
Achievement tests from an item perspective : an exploration of single item data from the PISA and TIMSS studies, and how such data can inform us about students’ knowledge and thinking in science



Abstract

Summary of chapter 1

The thesis was introduced in this chapter by presenting the fundamental rationale for why analysis of items, either one by one or through the study of profiles across a few items, is worthwhile. This rationale was based on a model of how items are typically correlated with each other and with the overall score in an achievement test such as those in TIMSS and PISA. It followed from this model that if we represent the total achievement measure by one overall latent factor, only a small fraction of the variance in the scored items is accounted for by a typical cognitive test score. Furthermore, this argument was taken one step further by also considering the categorical information in the codes initially used by the markers. Before the variables in the data file are scored, they are nominal variables with codes reflecting qualitative aspects of students’ responses. Taken together with the theoretical model of the scored items, it was concluded that further analysis of the single items would be reasonable, and would involve the analysis of information beyond that contained in the overall score. All the empirical papers in the thesis are based on this rationale: to analyse the surplus information in the items. The purpose of the thesis was then formulated as an exploration of the nature of this surplus information, and of the potential for using this information to describe qualitative differences at the student or country level. Furthermore, the underlying motivation for doing this was stated as a desire to inform the science education community about the potential for, and limitations of, using data from LINCAS in secondary research. This latter issue was elaborated and discussed in the next chapter.

Summary of chapter 2

This chapter gave a broad presentation of LINCAS, their policy relevance, and their link, or lack thereof, to the field of science education research.
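The chapter 1 claim that a single overall factor accounts for only a small fraction of single-item variance can be made concrete with a quick simulation. This is an illustrative sketch only: the common-factor model and the loading of 0.5 are assumed, typical values for cognitive test items, not figures taken from the thesis.

```python
import numpy as np

# Sketch: how much single-item variance does one overall factor account for?
# Assumes a common-factor model with a loading of 0.5 per item -- a typical,
# illustrative value, not a figure taken from the thesis.
rng = np.random.default_rng(0)
n_students, n_items, loading = 10_000, 40, 0.5

theta = rng.normal(size=n_students)                      # latent ability
noise = rng.normal(size=(n_students, n_items))           # item-specific part
items = loading * theta[:, None] + np.sqrt(1 - loading**2) * noise

total = items.sum(axis=1)                                # overall test score
# Squared correlation of each item with the factor and with the total score:
r2_factor = np.corrcoef(items.T, theta)[:n_items, -1] ** 2
r2_total = np.corrcoef(items.T, total)[:n_items, -1] ** 2
print(f"mean item variance explained by the factor: {r2_factor.mean():.2f}")
print(f"mean item variance explained by the total:  {r2_total.mean():.2f}")
```

With a loading of 0.5 the factor explains roughly a quarter of each item's variance, leaving the bulk of the item-level variance outside the overall score.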
The chapter consisted of several related elements that, taken together, addressed the issue of why and how researchers in science education could or should engage in analyses of LINCAS. This was done by presenting the historical development of LINCAS, from the first IEA studies at the end of the 1950s to the contemporary PISA and TIMSS studies. I suggested that the development in this period reflects broader societal issues. Moreover, I suggested that this development illustrates a tension, or dilemma, that LINCAS have been confronted with from the very beginning: LINCAS were initially framed by the idea that international comparisons could be the basis of a powerful design for studying educational issues. Thus, the main idea driving the genesis of LINCAS (which I labelled Purpose I) was an ambition to utilise the international variation in the study of general educational issues. This research base has been maintained throughout the history of LINCAS. What made it possible to conduct the increasingly expensive studies was the fact that policy makers valued the studies as providers of policy-relevant information. Over the years there has been a shift towards the purpose of finding evidence for effective policy at the system or national level (which I labelled Purpose II), and the discussion in this chapter demonstrates that this vision for LINCAS is very visible in the PISA study. It would be fair to say that my thesis aims to promote Purpose I, and, furthermore, it aims to promote the view that the tension often perceived between the two purposes is to some degree based on a lack of communication and interaction between policy makers and educational researchers. The chapter then turned to a comparison between PISA and TIMSS. This is an issue that is worthwhile in itself, because there are some indications that users of the information may be confused by discrepant results in the two surveys.
However, by examining the differences between the studies, it is evident that the results should not be compared in a simplistic manner: the studies have different designs targeting different populations and different levels of the school systems, they have defined the achievement measures differently, and even if many countries participate in both studies, the composition of the countries in the two studies is clearly not the same. Chapter 2 continued by discussing how science education may be linked to the policy context by engaging in secondary analysis of data and documents from LINCAS. This was not to argue that all, or even most, of the research in science education should be linked to PISA or TIMSS. Nevertheless, a relatively comprehensive review of possibilities for secondary analysis related to LINCAS was presented in the chapter, and the increased potential for such analyses relating to scientific literacy in PISA after the 2006 study was emphasised.

Summary of chapter 3

Chapter 3 gave an overview of some methodological issues that have heavily influenced my work. It began by placing my work in a tradition that could best be labelled exploratory data analysis. The main idea of this tradition is that, when confronted with a data set, we should seek to develop a description of the overall structure in the data, the multivariate relationship. This is a challenging task, since there is no general procedure to follow for finding such overall patterns in the data. In addition, the general issue of the nature of the information in the cognitive items in TIMSS/PISA was explored in this chapter. A notable innovation in TIMSS was the double-digit codes and the associated marking rubrics used for the constructed-response items. With TIMSS it was acknowledged that using only multiple-choice items, which before TIMSS was commonplace in most large-scale assessments, would seriously limit the range of competencies activated by a test.
By using open-ended questions, giving students the opportunity to construct their own responses, TIMSS had the ambition of developing descriptions of how students represented and made use of concepts in science. The double-digit codes were used to preserve that information. This was also the idea in the science assessment of PISA 2000, although the generic system was slightly modified. However, with PISA 2003, and with the items that underwent field trials before PISA 2006, it is evident that the use of such coding is gradually disappearing. The reason for this change is not entirely clear, but it may be suggested that the codes have been of little use internationally. Nevertheless, constructed-response items will still be used, since they allow for the testing of competencies other than those covered by selected-response formats. The paradoxical consequence of this is that from PISA 2006 onwards, more information about students’ thinking and knowledge will be available from analysis of the multiple-choice items than from students’ own written accounts of their reasoning and thinking, since the former at least include a code reflecting the response selected by the students. The constructed-response items that were originally introduced into these assessments as tools for making students demonstrate their thinking and reasoning are, in the marking guides for PISA 2006, more or less directly reduced to a description of how to score the items. Even if the marking guide includes explicit descriptions of the criteria for scoring, for the great majority of items there are no longer separate codes for students with different types of responses. I will suggest that this development was perhaps inevitable, given that these codes were not extensively used or reported on in the international reports. However, I regard this development as a decrease in the potential for communicating how students typically think and interact with the items in tests like PISA.
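The double-digit coding and the subsequent reduction to scores can be sketched as a small lookup table. The codes and category descriptions below are invented illustrations in the spirit of the TIMSS rubrics, not actual codes from the studies:

```python
# Sketch of double-digit coding and its reduction to scores.
# The codes below are invented illustrations in the spirit of the TIMSS
# marking rubrics; they are not actual codes from the studies.
# First digit: correctness level; second digit: qualitative response type.
DOUBLE_DIGIT_CODES = {
    "10": "correct, explained via energy transfer",
    "11": "correct, explained via particle motion",
    "70": "incorrect, everyday misconception",
    "71": "incorrect, confuses heat with temperature",
    "99": "blank or off-task",
}

def score(code: str) -> int:
    """Scoring keeps only the first digit's correctness information."""
    return 1 if code.startswith("1") else 0

# Several qualitatively distinct codes collapse onto two score points:
for code, meaning in DOUBLE_DIGIT_CODES.items():
    print(code, "->", score(code), f"({meaning})")
```

The point of the sketch is the collapse: five qualitatively distinct response classes survive coding, but only two score points survive scoring.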
Furthermore, this development can be viewed as unfortunate from the perspective that such data could be an important resource for secondary analysis aimed at studying students’ understanding of very specific scientific concepts or phenomena. Figure 3.3 provided some bipolar characteristics of analyses of information at different levels of item processing: from specific written responses, through the coded responses, and finally to the scored items. Information is continuously and consciously peeled off in this process. In the first process, coding, all aspects that are seen as irrelevant to the overall intention of the response are peeled off. This may, for instance, be information regarding errors in spelling, errors in grammar, and other very specific elements of the response. However, it may also be information that reflects characteristic features of students’ thinking and knowledge. The marking guide has to be understood in the same way by all markers, in all countries; thus, it is a necessary condition that the number of codes is limited and that they reflect clearly identifiable features of students’ responses. The codes therefore represent classes of typical responses that may be distinguished from each other. In the next process, when the items are scored, all aspects other than the overall quality or correctness of the response are peeled off. The score can therefore be considered as representing not aspects of the responses as such, but rather aspects of the ability that students have used to create their responses. At least this is the idea. However, as demonstrated in Figure 1.1, the score information at the single-item level is still highly specific to the item. Furthermore, chapter 3 addressed more specifically the methods used in one of the papers: correspondence and homogeneity analysis.
I have so far not seen any other analysis where these, or similar, tools are used to study the relationship between nominally measured cognitive variables. In that sense the work undertaken in this paper represents an innovative approach to the analysis of data from cognitive tests. The aim of this section of chapter 3 was to write about the methods at a level requiring very little mathematics. This was a conscious choice in order to make this part of the text accessible to a more diverse group of readers. One consequence of this is that some interesting aspects of the methods are not commented on. Furthermore, since the language of mathematics is a useful tool that allows for very precise and unequivocal communication, another unfortunate consequence may be that the text is ambiguous, thus allowing misunderstandings to develop. Nevertheless, writing for a wider audience has forced me to challenge my own understanding of the methods I have applied.
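For readers who do want a little more mathematics than the chapter offers: the core machinery of correspondence analysis is a singular value decomposition of the standardized residuals of a contingency table of nominal codes. The sketch below is a generic textbook construction on an invented three-by-three table, not the specific analysis from the paper:

```python
import numpy as np

# Minimal correspondence-analysis sketch for a contingency table of nominal
# item codes. The table values are invented for illustration only.
N = np.array([[20.0,  5.0,  2.0],
              [ 4.0, 18.0,  6.0],
              [ 3.0,  7.0, 15.0]])
P = N / N.sum()                        # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
# Standardized residuals: (observed - expected) / sqrt(expected)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
# Principal coordinates of the row categories (profiles of one variable):
rows = (U * sv) / np.sqrt(r)[:, None]
inertia = (sv ** 2).sum()              # total inertia = chi-squared / n
print("singular values:", np.round(sv, 3))
print("total inertia:", round(inertia, 3))
print("row coordinates, first axis:", np.round(rows[:, 0], 2))
```

Because the residuals are centred, one trivial dimension vanishes, and the remaining axes order the nominal categories by how strongly their profiles deviate from independence: exactly the kind of structure the thesis seeks in the coded responses.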

Bibliographic details

  • Author: Olsen, Rolf Vegar
  • Year: 2005
  • Format: PDF
  • Language: en
